Abstract

I discuss the issue of uncertainties in parton distributions and in the physical quantities which are determined in terms of them. While there has been significant progress on the uncertainties associated with errors on experimental data, there are still outstanding questions. Also, I demonstrate that in many circumstances this source of errors may be less important than errors due to underlying assumptions in the fitting procedure and due to the incomplete nature of the theoretical calculations.



1. Introduction to Global Fits 

The fundamental quantities one requires in the calculation of scattering processes involving hadronic particles are the parton distributions. These can be derived from and then used within QCD. Using the Factorization Theorem, the cross-section for a given process, e.g. $ep \to eX$, can be written in the factorized form

$$\sigma(ep \to eX) = \sum_i C_i^P(x, \alpha_s(Q^2)) \otimes f_i(x, Q^2, \alpha_s(Q^2)) \tag{1}$$

up to corrections of order $\Lambda_{QCD}^2/Q^2$, known as higher twist. The coefficient functions $C_i^P(x, \alpha_s(Q^2))$ describing the hard scattering process are process dependent but are calculable as a power series in the strong coupling constant $\alpha_s(Q^2)$:

$$C_i^P(x, \alpha_s(Q^2)) = \sum_k C_i^{P,k}(x)\, \alpha_s^k(Q^2). \tag{2}$$

The $f_i(x, Q^2, \alpha_s(Q^2))$ are the parton distributions, i.e. the probability of finding a parton of type $i$ carrying a fraction $x$ of the momentum of the hadron. Because they depend on the nonperturbative way in which partons are bound into the hadron, these parton distributions are not calculable from first principles. However, they do evolve with $Q^2$ in a perturbative manner

$$\frac{d f_i(x, Q^2, \alpha_s(Q^2))}{d \ln Q^2} = \sum_j P_{ij}(x, \alpha_s(Q^2)) \otimes f_j(x, Q^2, \alpha_s(Q^2)) \tag{3}$$

where the splitting functions $P_{ij}(x, \alpha_s(Q^2))$ are calculable order by order in perturbation theory. Since the parton distributions $f_i(x, Q^2, \alpha_s(Q^2))$ are process-independent, i.e. universal, once they have been measured at one experiment, one can predict many other scattering processes.

In order to determine the parton distributions one can use a range of available data, largely $ep \to eX$ (structure functions), and the most up-to-date QCD calculations, which are currently NLO in $\alpha_s(Q^2)$. (NNLO coefficient functions are known for some processes, e.g. structure functions; considerable information on the NNLO splitting functions is available, and they may be known within a year or so.) Perturbation theory is assumed to be valid if $\alpha_s(Q^2) < 0.3$, so only data with $Q^2 > 2\,\mathrm{GeV}^2$ or more are used. This cut should also remove the influence of higher twists.

The global fit [1]-[8] usually proceeds by starting the parton evolution at a low scale $Q_0^2 \sim 1\,\mathrm{GeV}^2$, and evolving partons upwards using NLO DGLAP equations. In principle there are 11 different parton distributions to consider (isospin symmetry is assumed, i.e. if $p \to n$, $d(x) \to u(x)$ and $u(x) \to d(x)$):

$$u, \bar u, \quad d, \bar d, \quad s, \bar s, \quad c, \bar c, \quad b, \bar b, \quad g. \tag{4}$$

In practice $m_c, m_b \gg \Lambda_{QCD}$ so the heavy parton distributions are determined perturbatively. Also it is currently assumed that $s = \bar s$. The 6 independent parton sets are then

$$u_V = u - \bar u, \quad d_V = d - \bar d, \quad \mathrm{sea} = 2(\bar u + \bar d + \bar s), \quad \bar d - \bar u, \quad g. \tag{5}$$



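The input partons are parameterized in a particular form, e.g.

$$x f(x, Q_0^2) = A (1 - x)^{\eta} (1 + \epsilon x^{0.5} + \gamma x)\, x^{\delta}. \tag{6}$$

The partons are then constrained by a number of sum rules:

$$\int_0^1 u_V(x)\, dx = 2, \qquad \int_0^1 d_V(x)\, dx = 1, \qquad \int_0^1 \left[ x\Sigma(x) + x g(x) \right] dx = 1, \tag{7}$$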



i.e. conservation of the number of valence quarks, and conservation of the momentum carried by partons. 
The latter is an important constraint on the form of the gluon which is only probed indirectly. 
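As a minimal sketch of how a number sum rule fixes a normalization, the integral of the input form in eq. (6) can be written in terms of Euler beta functions, so that $A$ follows directly from the required integral. The function names and the shape parameters used below are hypothetical and chosen only for illustration; they are not taken from any actual fit.

```python
from math import gamma


def beta(p, q):
    """Euler beta function B(p, q): integral of x^(p-1) (1-x)^(q-1) on [0, 1]."""
    return gamma(p) * gamma(q) / gamma(p + q)


def moment(n, delta, eta, eps, gam):
    """Integral on [0, 1] of x^n f(x) for A = 1, where
    x f(x) = (1 - x)^eta (1 + eps x^0.5 + gam x) x^delta.
    n = 0 is the number integral, n = 1 the momentum fraction.
    The integral converges only if delta + n > 0."""
    p = delta + n          # x^n f(x) = x^(p-1) (1-x)^eta (1 + eps x^0.5 + gam x)
    q = eta + 1.0
    return beta(p, q) + eps * beta(p + 0.5, q) + gam * beta(p + 1.0, q)


def normalisation(target, delta, eta, eps, gam):
    """Value of A for which the number integral equals `target`."""
    return target / moment(0, delta, eta, eps, gam)


# Hypothetical valence-like shape (illustration only, not fitted values).
A_uv = normalisation(2.0, delta=0.5, eta=3.0, eps=0.0, gam=0.0)
momentum_fraction = A_uv * moment(1, 0.5, 3.0, 0.0, 0.0)
print(A_uv, momentum_fraction)
```

In a real fit the momentum sum rule would be imposed in the same way, typically by solving for the gluon normalization once the quark momentum fractions are known.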

In determining partons one needs to consider that not only are there 6 different combinations of partons, but there is also a wide range of $x$ from 0.75 to 0.00003. One needs many different types of experiment for full determination. The full set of data usually used is: H1 and ZEUS $F_2^p(x, Q^2)$ data [9, 10], which cover small $x$ and a wide range of $Q^2$; E665 $F_2^{p,d}(x, Q^2)$ data [11] at medium $x$; BCDMS and SLAC $F_2^{p,d}(x, Q^2)$ data [12]-[13] at large $x$; NMC $F_2^{p,d}(x, Q^2)$ [14] at medium and large $x$; CCFR neutrino $F_2(x, Q^2)$ and $F_3(x, Q^2)$ data [15] at large $x$, which probe the singlet and valence quarks independently; ZEUS and H1 $F_{2,\mathrm{charm}}^p(x, Q^2)$ data [16, 17]; E605 $pN \to \mu\bar\mu + X$ [18], constraining the large $x$ sea; E866 Drell-Yan asymmetry [19], which determines $\bar d - \bar u$; CDF $W$-asymmetry data [20], which constrains the $u/d$ ratio at large $x$; CDF and D0 inclusive jet data [21, 22], which tie down the high $x$ gluon; and CCFR and NuTeV dimuon data [23, 24], which constrain the strange sea. Note that I discuss unpolarized parton distributions. There are far fewer data for polarized distributions, though fits with error determinations do exist, e.g. [25].



1.1 Quality of the Fit 

This is determined by the $\chi^2$ of the fit to data, which may be calculated in various ways. The simplest is to add statistical and systematic errors in quadrature. This ignores correlations between data points, but is sometimes quite effective. Also, the information available on the data often means that only this method is possible.

However, more properly one uses the full covariance matrix, which is constructed as

$$C_{ij} = \delta_{ij}\, \sigma_{i,\mathrm{stat}}^2 + \sum_{k=1}^{n} \rho_{ij}^k\, \sigma_{k,i}\, \sigma_{k,j}, \tag{8}$$

where $k$ runs over each source of correlated systematic error and $\rho_{ij}^k$ are the correlation coefficients. The $\chi^2$ is defined by

$$\chi^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} (D_i - T_i(a))\, C_{ij}^{-1}\, (D_j - T_j(a)), \tag{9}$$

where $N$ is the number of data points, $D_i$ is the measurement and $T_i(a)$ is the theoretical prediction depending on parton input parameters $a$. Unfortunately this method relies on inverting very large matrices.
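The covariance-matrix definition of eqs. (8) and (9) can be illustrated with a small pure-Python sketch. It assumes fully correlated systematic sources (all $\rho_{ij}^k = 1$), and it solves the linear system $C y = r$ rather than forming $C^{-1}$ explicitly; the function names and the toy numbers are illustrative only.

```python
def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    y = [0.0] * n
    for row in range(n - 1, -1, -1):
        acc = aug[row][n] - sum(aug[row][c] * y[c] for c in range(row + 1, n))
        y[row] = acc / aug[row][row]
    return y


def covariance(stat, sys_sources):
    """Eq. (8) with rho = 1 (an assumption of this sketch):
    C_ij = delta_ij stat_i^2 + sum_k sys_sources[k][i] * sys_sources[k][j]."""
    n = len(stat)
    return [[(stat[i] ** 2 if i == j else 0.0)
             + sum(src[i] * src[j] for src in sys_sources)
             for j in range(n)] for i in range(n)]


def chi2_cov(data, theory, cov):
    """Eq. (9): chi2 = r^T C^{-1} r with r = data - theory."""
    resid = [d - t for d, t in zip(data, theory)]
    return sum(r * y for r, y in zip(resid, solve(cov, resid)))


# Toy example: two points, unit statistical errors, one common systematic.
cov = covariance([1.0, 1.0], [[1.0, 1.0]])
print(chi2_cov([1.0, 1.0], [0.0, 0.0], cov))
```

With no systematic sources the result reduces to the usual sum of squared, error-weighted residuals.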

An alternative, which is identical to the covariance matrix definition of $\chi^2$ if the errors are small, is to incorporate the correlated errors into the theory prediction

$$f_i(a, s) = T_i(a) + \sum_{k=1}^{n} s_k \Delta_{ik}, \tag{10}$$

where $\Delta_{ik}$ is the one-sigma correlated error for point $i$ from source $k$. In this case the $\chi^2$ is defined by

$$\chi^2 = \sum_{i=1}^{N} \left( \frac{D_i - f_i(a, s)}{\sigma_{i,\mathrm{unc}}} \right)^2 + \sum_{k=1}^{n} s_k^2, \tag{11}$$



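where the second term constrains the values of $s_k$, assuming the correlated systematic errors are Gaussian distributed. In this method the data may move en masse relative to the theory. One can solve for the $s_k$ analytically [26, 3]. Defining

$$B_k = \sum_{i=1}^{N} \frac{\Delta_{ik}\, (D_i - T_i(a))}{\sigma_{i,\mathrm{unc}}^2}, \qquad A_{kl} = \delta_{kl} + \sum_{i=1}^{N} \frac{\Delta_{ik} \Delta_{il}}{\sigma_{i,\mathrm{unc}}^2}, \tag{12}$$

one obtains

$$\frac{\partial \chi^2}{\partial s_k} = 0 \quad \Rightarrow \quad s_k(a) = \sum_{l=1}^{n} A_{kl}^{-1} B_l. \tag{13}$$

This leads to the $\chi^2$ definition

$$\chi^2 = \sum_{i=1}^{N} \left( \frac{D_i - T_i(a)}{\sigma_{i,\mathrm{unc}}} \right)^2 - \sum_{k=1}^{n} \sum_{l=1}^{n} B_k A_{kl}^{-1} B_l. \tag{14}$$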
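The analytic solution for the systematic shifts can be sketched in a few lines. The example below builds $B_k$ and $A_{kl}$, solves for the $s_k$, and evaluates the reduced $\chi^2$ of eq. (14); it also evaluates eq. (11) directly at the solved shifts as a cross-check. The function names and toy inputs are illustrative assumptions, and `delta[k][i]` stands for $\Delta_{ik}$.

```python
def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    y = [0.0] * n
    for row in range(n - 1, -1, -1):
        acc = aug[row][n] - sum(aug[row][c] * y[c] for c in range(row + 1, n))
        y[row] = acc / aug[row][row]
    return y


def shifts_and_chi2(data, theory, sigma_unc, delta):
    """Return (s, chi2) from eqs. (12)-(14); delta[k][i] is Delta_ik."""
    resid = [d - t for d, t in zip(data, theory)]
    w = [1.0 / s ** 2 for s in sigma_unc]
    n_src = len(delta)
    b = [sum(w[i] * delta[k][i] * resid[i] for i in range(len(resid)))
         for k in range(n_src)]
    a = [[(1.0 if k == l else 0.0)
          + sum(w[i] * delta[k][i] * delta[l][i] for i in range(len(resid)))
          for l in range(n_src)] for k in range(n_src)]
    s = solve(a, b) if n_src else []
    chi2 = sum(wi * r ** 2 for wi, r in zip(w, resid))
    chi2 -= sum(bk * sk for bk, sk in zip(b, s))   # B^T A^{-1} B
    return s, chi2


def chi2_eq11(data, theory, sigma_unc, delta, s):
    """Direct evaluation of eq. (11) for given shifts s."""
    total = sum(sk ** 2 for sk in s)
    for i, (d, t, sig) in enumerate(zip(data, theory, sigma_unc)):
        f_i = t + sum(s[k] * delta[k][i] for k in range(len(s)))
        total += ((d - f_i) / sig) ** 2
    return total


s, chi2 = shifts_and_chi2([1.0, 1.0], [0.0, 0.0], [1.0, 1.0], [[1.0, 1.0]])
print(s, chi2, chi2_eq11([1.0, 1.0], [0.0, 0.0], [1.0, 1.0], [[1.0, 1.0]], s))
```

Only an $n \times n$ system (one row per systematic source) has to be solved, rather than an $N \times N$ one.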



This approach has the double advantage that smaller matrices need inverting and one sees explicitly the shift of data relative to theory. However, it is doubtful that Gaussian correlated errors are realistic. The method also allows one to move data simply to compensate for the shortcomings of theory. Indeed, MRST find that for HERA data increments in $\chi^2$ using this method are the same as for adding in quadrature, and that the data move towards theory rather than vice versa [8]. Hence it is questionable in practice quite how much of an improvement this approach is in many cases. However, for Tevatron jet data, where correlated systematic errors dominate, a sophisticated treatment of correlated errors is essential.

Using some particular method of calculating $\chi^2$, the global fit procedure completely determines the parton distributions at present. In general the total fit is of reasonably good quality, as illustrated in table 1 for the major data sets and the CTEQ6 fit (which assumes $\alpha_s(M_Z^2)$ fixed at 0.118). The total $\chi^2 = 1954/1811$. For MRST $\alpha_s(M_Z^2)$ is determined to be 0.119, and the total $\chi^2 = 2328/2097$. However, the $\chi^2$ per point of more than one suggests some possible shortcomings, and it may be argued that there are some areas where the theory perhaps needs to be improved.

Table 1: $\chi^2$ versus number of data points for the CTEQ6 fit.

Data set            No. of data pts    $\chi^2$
H1 ep               230                228
ZEUS ep             229                263
BCDMS $\mu p$       339                378
BCDMS $\mu d$       251                280
NMC $\mu p$         201                305
E605 (Drell-Yan)    119                95
D0 Jets             90                 65
CDF Jets            33                 49
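To make the "$\chi^2$ per point" statement concrete, the following short sketch divides each $\chi^2$ in table 1 by its number of data points, and does the same for the two quoted totals. The numbers are copied from the text; the variable names are illustrative.

```python
# (number of data points, chi2) for the major data sets in the CTEQ6 fit.
table1 = {
    "H1 ep": (230, 228),
    "ZEUS ep": (229, 263),
    "BCDMS mu p": (339, 378),
    "BCDMS mu d": (251, 280),
    "NMC mu p": (201, 305),
    "E605 (Drell-Yan)": (119, 95),
    "D0 Jets": (90, 65),
    "CDF Jets": (33, 49),
}


def chi2_per_point(n_points, chi2):
    """Ratio of chi2 to the number of data points."""
    return chi2 / n_points


per_set = {name: chi2_per_point(n, c) for name, (n, c) in table1.items()}
totals = {
    "CTEQ6": chi2_per_point(1811, 1954),
    "MRST": chi2_per_point(2097, 2328),
}

for name, ratio in {**per_set, **totals}.items():
    print(f"{name:18s} {ratio:.3f}")
```

The per-set ratios vary considerably more than the totals, which is one way of seeing where individual data sets are in tension with the fit.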



2. Parton Uncertainties 

There are a number of different approaches for obtaining parton uncertainties. 



2.1 Hessian (Error Matrix) Approach 

This was first used by H1 and has recently been extended by CTEQ. One defines the Hessian matrix by

$$\Delta\chi^2 = \chi^2 - \chi^2_{\min} = \sum_{i,j} H_{ij}\, (a_i - a_i^{(0)})(a_j - a_j^{(0)}). \tag{15}$$

The Hessian matrix $H$ is related to the covariance matrix of the parameters by

$$C_{ij}(a) = \Delta\chi^2\, (H^{-1})_{ij}. \tag{16}$$

We can then use the standard formula for linear error propagation:

$$(\Delta F)^2 = \Delta\chi^2 \sum_{i,j} \frac{\partial F}{\partial a_i}\, (H^{-1})_{ij}\, \frac{\partial F}{\partial a_j}. \tag{17}$$

This has been used to find partons with errors by H1 [9] and Alekhin, each with restricted data sets. In practice it is problematic due to extreme variations in $\Delta\chi^2$ in different directions in parameter space.
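A minimal sketch of the linear error propagation formula, for a toy two-parameter case where the Hessian can be inverted by hand, might look as follows. The Hessian, the gradient of $F$ and the tolerance are made-up inputs for illustration, and the Hessian is taken in the normalization where $\Delta\chi^2$ equals the quadratic form in the parameter displacements.

```python
from math import sqrt


def invert_2x2(h):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("singular Hessian")
    return [[d / det, -b / det], [-c / det, a / det]]


def delta_F(hessian, grad_F, delta_chi2):
    """Linear propagation: (Delta F)^2 = delta_chi2 * g^T H^{-1} g."""
    h_inv = invert_2x2(hessian)
    quad = sum(grad_F[i] * h_inv[i][j] * grad_F[j]
               for i in range(2) for j in range(2))
    return sqrt(delta_chi2 * quad)


# Toy inputs: one steep and one flat direction in parameter space.
print(delta_F([[4.0, 0.0], [0.0, 1.0]], [1.0, 1.0], 1.0))
print(delta_F([[2.0, 1.0], [1.0, 2.0]], [1.0, 0.0], 100.0))
```

When the eigenvalues of the Hessian span many orders of magnitude the explicit inversion becomes numerically delicate, which is the practical difficulty mentioned above.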

Fig. 1: Representation of diagonalization of Hessian matrix. (a) Original parameter basis: 2-dim $(i, j)$ rendition of the d-dim (~20) PDF parameter space, with contours of constant $\chi^2_{\mathrm{global}}$. (b) Orthonormal eigenvector basis, obtained by diagonalization and rescaling by the iterative method.

This is solved by finding and rescaling eigenvectors of $H$, leading to the diagonal form

$$\Delta\chi^2 = \sum_i z_i^2. \tag{18}$$

The method has been implemented by CTEQ [28, 27, 26]. The uncertainty on a physical quantity is

$$(\Delta F)^2 = \frac{1}{4} \sum_i \left( F(S_i^{(+)}) - F(S_i^{(-)}) \right)^2, \tag{19}$$

where $S_i^{(+)}$ and $S_i^{(-)}$ are PDF sets displaced along eigenvector directions by the given $\Delta\chi^2$. There is uncertainty in choosing the "correct" $\Delta\chi^2$ (in principle one unit) given the complications of a full global fit. CTEQ choose $\Delta\chi^2 \sim 100$ [26]. A discussion of this problem is found in [29].
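The eigenvector-set prescription can be sketched as below. The code assumes the symmetric form $\Delta F = \frac{1}{2} \left[ \sum_i (F(S_i^{(+)}) - F(S_i^{(-)}))^2 \right]^{1/2}$, which is the form that agrees with linear propagation when $F$ is linear in the rescaled parameters $z_i$; the function name and the toy values of $F$ are illustrative.

```python
from math import sqrt


def eigenvector_uncertainty(f_plus, f_minus):
    """Delta F from values of F on the displaced sets S_i^(+) and S_i^(-).

    Assumes the symmetric form
        Delta F = 0.5 * sqrt(sum_i (F(S_i^+) - F(S_i^-))^2).
    """
    if len(f_plus) != len(f_minus):
        raise ValueError("need one (+) and one (-) value per eigenvector")
    return 0.5 * sqrt(sum((p - m) ** 2 for p, m in zip(f_plus, f_minus)))


# Toy check against linear propagation: F = F0 + sum_i g_i z_i, with each
# set displaced to z_i = +/- T, i.e. Delta chi2 = T^2 (made-up numbers).
F0, g, T = 20.0, [0.3, 0.4], 10.0
f_plus = [F0 + gi * T for gi in g]
f_minus = [F0 - gi * T for gi in g]
print(eigenvector_uncertainty(f_plus, f_minus))
```

For this linear toy case the result equals $T \sqrt{\sum_i g_i^2}$, as the linear error propagation formula requires.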



2.2 The Offset Method. 

In this case the best fit is obtained by minimizing

$$\chi^2 = \sum_{i=1}^{N} \left( \frac{D_i - T_i(a)}{\sigma_{i,\mathrm{unc}}} \right)^2, \tag{20}$$

i.e. the best fit and parameters $a_0$ are obtained by considering only uncorrelated errors. This forces the theory to be close to unshifted data. The quality of the fit is then estimated by adding errors in quadrature. The systematic errors on the $a_i$ are determined by letting each $s_k = \pm 1$ and adding the deviations in quadrature. In practice one calculates 2 Hessian matrices

$$M_{ij} = \frac{1}{2} \frac{\partial^2 \chi^2}{\partial a_i \partial a_j}, \qquad V_{ij} = \frac{1}{2} \frac{\partial^2 \chi^2}{\partial a_i \partial s_j}, \tag{21}$$

and defines covariance matrices

$$C_{\mathrm{stat}} = M^{-1}, \qquad C_{\mathrm{sys}} = M^{-1} V V^T M^{-1}, \qquad C_{\mathrm{tot}} = C_{\mathrm{stat}} + C_{\mathrm{sys}}, \tag{22}$$

to achieve the same result. This was used in early H1 fits [30] and by ZEUS. A discussion and presentation of this method and of ZEUS results can be found in [31]. The offset method leads to a bigger uncertainty than the Hessian method for the same $\Delta\chi^2$ [32].

Fig. 2: Results of CTEQ Hessian approach for gluon uncertainty.
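The equivalence between refitting with each systematic shifted by one sigma and the matrix expression for the systematic covariance can be checked in a toy one-parameter fit. In the sketch below the "theory" is simply a constant $a$, the $\chi^2$ is that of the shifted-theory form with $f_i = a + \sum_k s_k \Delta_{ik}$, and the second derivatives are taken with a factor $1/2$ so that the statistical variance is $M^{-1}$. All names and numbers are illustrative assumptions.

```python
def offset_fit(data, sigma_unc, delta):
    """Offset-method fit of a constant a; delta[k][i] is Delta_ik.

    Returns (a0, c_stat, c_sys), where for this one-parameter case
    M = sum_i 1/sigma_i^2 and V_k = sum_i Delta_ik / sigma_i^2.
    """
    w = [1.0 / s ** 2 for s in sigma_unc]
    m = sum(w)
    a0 = sum(wi * d for wi, d in zip(w, data)) / m   # uncorrelated errors only
    v = [sum(wi * dki for wi, dki in zip(w, dk)) for dk in delta]
    c_stat = 1.0 / m
    c_sys = sum((vk / m) ** 2 for vk in v)           # M^-1 V V^T M^-1
    return a0, c_stat, c_sys


def refit_shifted(data, sigma_unc, delta_k, s):
    """Best-fit a when source k is held at s (data effectively shifted)."""
    w = [1.0 / sig ** 2 for sig in sigma_unc]
    shifted = [d - s * dk for d, dk in zip(data, delta_k)]
    return sum(wi * d for wi, d in zip(w, shifted)) / sum(w)


data, sigma, delta = [1.0, 3.0], [1.0, 1.0], [[0.5, 0.5]]
a0, c_stat, c_sys = offset_fit(data, sigma, delta)
deviation = refit_shifted(data, sigma, delta[0], 1.0) - a0
print(a0, c_stat, c_sys, deviation ** 2)
```

With several sources the squared deviations from each refit are added, which is what the $V V^T$ product does in one step.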



2.3 Statistical Approach H 

In this one constructs an ensemble of distributions labelled by $\mathcal{F}$, each with probability $P(\{\mathcal{F}\})$. The mean $\mu_O$ and deviation $\sigma_O$ of an observable $O$ are then given by

$$\mu_O = \sum_{\{\mathcal{F}\}} O(\{\mathcal{F}\})\, P(\{\mathcal{F}\}), \qquad \sigma_O^2 = \sum_{\{\mathcal{F}\}} \left( O(\{\mathcal{F}\}) - \mu_O \right)^2 P(\{\mathcal{F}\}). \tag{23}$$

While this is statistically correct, and does not rely on the approximation of linear propagation of errors in calculating observables, it is inefficient. In practice, one generates $N_{pdf}$ different distributions with unit weight but distributed according to $P(\{\mathcal{F}\})$, where $N_{pdf}$ can be made as small as 100. Then

$$\mu_O = \frac{1}{N_{pdf}} \sum_{\{\mathcal{F}\}} O(\{\mathcal{F}\}), \qquad \sigma_O^2 = \frac{1}{N_{pdf}} \sum_{\{\mathcal{F}\}} \left( O(\{\mathcal{F}\}) - \mu_O \right)^2. \tag{24}$$

One can incorporate full information about measurements and their error correlations in the calculation of $P(\{\mathcal{F}\})$.
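The unit-weight estimators of eq. (24) amount to a plain mean and a $1/N_{pdf}$-normalized variance over the ensemble. A minimal sketch, with a made-up list standing in for the values of an observable on each member of the ensemble:

```python
from math import sqrt


def ensemble_mean_and_sigma(values):
    """Mean and deviation of an observable over a unit-weight ensemble.

    `values` holds O({F}) for each generated distribution; the variance
    uses the 1/N normalisation of eq. (24).
    """
    n = len(values)
    if n == 0:
        raise ValueError("empty ensemble")
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, sqrt(var)


# Made-up values of an observable on four ensemble members.
mu, sigma = ensemble_mean_and_sigma([1.0, 2.0, 3.0, 4.0])
print(mu, sigma)
```

No linearization of the observable in the fit parameters is needed: the observable is simply evaluated on every member of the ensemble.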

Currently the authors of this approach use only proton DIS data sets in order to avoid complicated uncertainty
issues such as shadowing effects for nuclear targets. Using strict confidence limits they find it difficult to 
obtain consistency between many different DIS experiments. Also the lack of important data sets leads 
to "unusual" values for some parameters, which illustrates the importance of using a wide variety of data. 
However, fig. 3 shows that indeed the Gaussian approximation is often not good, and shows potential 
complications for the more simplistic approaches. This is a very attractive but ambitious large-scale 
project with a lot of work still to be done. 
Fig. 3: One set of parton parameters obtained from the statistical approach. The red curve is the Gaussian approximation and the blue line the MRST value. The green curve for $\alpha_s$ is the LEP result.



2.4 Lagrange Multiplier 

One can look at the uncertainty on a given physical quantity using the Lagrange Multiplier method, first suggested by CTEQ [26] and also used by MRST [33, 34]. One performs the global fit while constraining the value of some physical quantity, i.e. minimizing

$$\Psi(\lambda, a) = \chi^2_{\mathrm{global}}(a) + \lambda F(a) \tag{25}$$

for various values of $\lambda$. This gives the set of best fits for particular values of the quantity $F(a)$ without relying on the Gaussian approximation for $\Delta\chi^2$. A useful example is the $W$ cross-section at the Tevatron, which is illustrated in fig. 4. The uncertainty in a quantity is determined by deciding an allowed value of $\Delta\chi^2$.
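A toy version of the Lagrange multiplier scan can be written down when $\chi^2$ is exactly quadratic and $F$ is linear, since the constrained minimum is then known in closed form. The sketch below assumes a diagonal $\chi^2(a) = \sum_i h_i a_i^2$ (minimum at $a = 0$) and $F(a) = \sum_i g_i a_i$; in a real global fit each value of $\lambda$ would instead require a full numerical minimization. All names and numbers are illustrative.

```python
def lagrange_scan(curvatures, grad, lambdas):
    """For each lambda, minimise Psi = chi2 + lambda * F analytically.

    Toy model: chi2(a) = sum_i h_i a_i^2 and F(a) = sum_i g_i a_i.
    Returns a list of (F - F_best, delta_chi2) pairs.
    """
    points = []
    for lam in lambdas:
        # dPsi/da_i = 2 h_i a_i + lam g_i = 0
        a = [-lam * g / (2.0 * h) for h, g in zip(curvatures, grad)]
        f = sum(g * ai for g, ai in zip(grad, a))
        dchi2 = sum(h * ai ** 2 for h, ai in zip(curvatures, a))
        points.append((f, dchi2))
    return points


def allowed_range(points, tolerance):
    """Largest |F - F_best| among scanned fits with delta_chi2 <= tolerance."""
    return max(abs(f) for f, dchi2 in points if dchi2 <= tolerance)


lambdas = [0.1 * i for i in range(-40, 41)]
points = lagrange_scan([4.0, 1.0], [1.0, 1.0], lambdas)
print(allowed_range(points, 1.3))
```

In this quadratic toy case the scan reproduces $(\Delta F)^2 = \Delta\chi^2 \sum_i g_i^2 / h_i$, i.e. the linear propagation result; the value of the method lies in the cases where the true $\chi^2$ profile is not quadratic.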
Fig. 4: Variation of $\sigma_W$ with total $\chi^2$ for the CTEQ fit.



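CTEQ use $\Delta\chi^2 = 100$ (same as for the Hessian approach). They obtain for $\alpha_s = 0.118$

$$\Delta\sigma_W(\mathrm{LHC}) \approx \pm 4\%, \qquad \Delta\sigma_W(\mathrm{Tev}) \approx \pm 5\%, \qquad \Delta\sigma_H(\mathrm{LHC}) \approx \pm 5\%. \tag{26}$$

The procedure is also used by MRST for a wider range of data, and using $\Delta\chi^2 \sim 50$. They find that for $\alpha_s = 0.119$ [34]

$$\Delta\sigma_W(\mathrm{Tev}) \approx \pm 1.2\%, \quad \Delta\sigma_W(\mathrm{LHC}) \approx \pm 2\%, \quad \Delta\sigma_H(\mathrm{Tev}) \approx \pm 4\%, \quad \Delta\sigma_H(\mathrm{LHC}) \approx \pm 2\%. \tag{27}$$

If $\alpha_s$ also varies, $\Delta\sigma_W$ is quite stable but $\Delta\sigma_H$ almost doubles. The $\chi^2$ profile is shown in fig. 5. One can repeat for other processes, e.g. HERA charged current data are sensitive to very high $x$ quarks, the Tevatron jet data are sensitive to the high $x$ gluon, etc.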

Overall one concludes that the uncertainty due to experimental errors is rather small, however they 
are dealt with. It only exceeds a few % for quantities related to the high x gluon or very high x quarks. 
However, there are other sources of error. 
Fig. 5: $\Delta\chi^2$-plot for $W$ and Higgs production at the Tevatron and LHC with $\alpha_s$ free. Contours show increments of 50 in $\Delta\chi^2$.



3. Other Errors. 

To obtain a complete estimate of errors, one also needs to consider the effect of the assumptions made 
during the fit. These include the cuts made on the data, the data sets fit, the parameterization for the input 
sets, the form of the strange sea, the assumption of no isospin violation, etc. It is known that many of these
can be as important as the experimental errors on data used (or even more so). A more systematic study 
is needed. 

It is also vital to consider sources of theoretical error. These include higher twist at low $Q^2$ and higher orders in $\alpha_s$. The latter are due not only to NNLO corrections, but also to enhancements at large and small $x$ because of terms of the form $\alpha_s^n \ln^{n-1}(1/x)$ and $\alpha_s^n \ln^{2n-1}(1 - x)$ in the perturbative expansion. This means that renormalization and factorization scale variation are not a reliable way of estimating higher order effects, e.g., at small $x$



$$P^1_{qg} \sim \alpha_s(\mu^2), \qquad P^2_{qg} \sim \frac{\alpha_s^2(\mu^2)}{x}, \tag{28}$$

whereas

$$P^n_{qg} \sim \frac{\alpha_s^n(\mu^2) \ln^{n-2}(1/x)}{x}, \tag{29}$$

and scale variations of $P^1_{qg}$, $P^2_{qg}$ never give an indication of these terms. Hence, in order to investigate the true theoretical error we must consider some way of performing correct large and small $x$ resummations, and/or use what we already know about NNLO. The latter approach implies that some quantities may acquire large higher order corrections [35].

Alternatively, one can use the empirical approach of investigating in detail the effect of cuts on data. In order to investigate the real quality of the fits and the regions with potential problems we try changing $W^2_{cut}$, $Q^2_{cut}$ and $x_{cut}$, re-fitting and seeing if the fit to the remaining data improves and/or the input parameters change dramatically [36]. (This is similar to a previous suggestion in terms of data sets rather than regions of parameter space [37].) This is continued until the fit quality and the partons stabilize.

Fig. 6: Comparison of MRST(2001) and a fit with $x_{cut} = 0.005$. Panels: MRST(2001) NLO fit, x = 0.02 - 0.08; MRST(2001) NLO fit, x = 0.0032 - 0.0175.

For $W^2_{cut}$, raising it from $12.5\,\mathrm{GeV}^2$ has no effect. Raising $Q^2_{cut}$ from $2\,\mathrm{GeV}^2$, there is a slow continuous improvement for higher $Q^2$ up to $> 12\,\mathrm{GeV}^2$, suggesting higher order corrections may be important. The small $x$ gluon decreases slightly, as does $\alpha_s(M_Z^2)$, as $Q^2_{cut}$ is raised. The predictions for most quantities remain quite stable. Raising $x_{cut}$ up to 0.005 leads to continuous improvement, with $\Delta\chi^2 = 51$ for the data surviving the cut. The improvement in the fit to structure function data is shown in fig. 6, and the fit to Tevatron jet data also improves. For $x_{cut} = 0.005$ there is much reduced tension between different data sets. The small $x$ gluon (outside the range of the fit) decreases significantly, allowing it to increase for higher $x$, facilitating the improved fit. $\alpha_s(M_Z^2)$ falls slightly to 0.118. This result suggests that higher order corrections with large $\ln(1/x)$ terms could be significant below $x = 0.005$. With $x_{cut} = 0.005$ predictions for Tevatron cross-sections are still possible and there is a large change compared to the default fit, as seen in fig. 7. The new prediction is well outside the limit set by experimental errors, suggesting that the theory error may easily be dominant for these quantities.



4. Conclusions 

One can perform global fits to data over a wide range of parameter space determining the partons very 
precisely. The fit quality is generally good, but there are some slight worries. There are various ways of 
looking at the uncertainties on partons due to errors on the data. Although there has been much progress 
recently, there is no universally preferred approach, each having strengths and weaknesses. The errors 
on partons and related quantities from this source are rather small, i.e. $\sim 1$-$5\%$.




However, the uncertainties from input assumptions, e.g. cuts on data, parameterizations etc., are comparable and possibly larger. Also, the errors from higher order corrections are potentially large, particularly in some regions of parameter space, and due to correlations between partons in different regions of phase space these feed into all regions (e.g. the small $x$ gluon influences the large $x$ gluon). For some/many processes theory is probably the dominant source of uncertainty at present. Systematic study of assumption/theory errors is needed as well as studies of uncertainties due to errors on the data. This is much harder, and is just beginning.